AITopics | ridge regularization

We consider $L^2$-regularized linear (ridge) regression over a finite data sample $X$ with bounded covariance and linear prediction targets $y$ with additive isotropic noise of finite variance. We present an iterative procedure to compute the optimal regularization strength numerically from the generative parameters in the fixed-$X$ setting and prove its convergence at limited noise levels. Our experimental evaluation over synthetic data shows that the proposed procedure combined with sample-based parameter estimates attains near-optimal random-$X$ generalization across a wide range of sample sizes, aspect ratios, and noise levels, at an added computational cost equivalent to one preliminary ridge regression in the underparameterized regime and two in the overparameterized case.

artificial intelligence, machine learning, regularization, (16 more...)

arXiv.org Machine Learning

2605.28679

Country: North America > United States (0.67)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

f976982cd1c1b9e076c096787ef6652e-Paper-Conference.pdf

Neural Information Processing SystemsApr-30-2026, 08:54:35 GMT

data mining, equivalence, machine learning, (19 more...)

Neural Information Processing Systems

Country: North America > United States > California (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Data Science > Data Mining (0.93)

Add feedback

f976982cd1c1b9e076c096787ef6652e-Paper-Conference.pdf

Neural Information Processing SystemsFeb-18-2026, 01:40:01 GMT

data mining, equivalence, machine learning, (19 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Alameda County > Berkeley (0.14)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Data Science > Data Mining (0.93)

Add feedback

c4f2c88e16a579900657c18726641c81-Paper.pdf

Neural Information Processing SystemsFeb-11-2026, 02:13:50 GMT

estimator, regression, robust risk, (16 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.49)

Add feedback

Basic Inequalities for First-Order Optimization with Applications to Statistical Risk Analysis

Paik, Seunghoon, Zhou, Kangjie, Telgarsky, Matus, Tibshirani, Ryan J.

arXiv.org Machine LearningJan-1-2026

We introduce \textit{basic inequalities} for first-order iterative optimization algorithms, forming a simple and versatile framework that connects implicit and explicit regularization. While related inequalities appear in the literature, we isolate and highlight a specific form and develop it as a well-rounded tool for statistical analysis. Let $f$ denote the objective function to be optimized. Given a first-order iterative algorithm initialized at $θ_0$ with current iterate $θ_T$, the basic inequality upper bounds $f(θ_T)-f(z)$ for any reference point $z$ in terms of the accumulated step sizes and the distances between $θ_0$, $θ_T$, and $z$. The bound translates the number of iterations into an effective regularization coefficient in the loss function. We demonstrate this framework through analyses of training dynamics and prediction risk bounds. In addition to revisiting and refining known results on gradient descent, we provide new results for mirror descent with Bregman divergence projection, for generalized linear models trained by gradient descent and exponentiated gradient descent, and for randomized predictors. We illustrate and supplement these theoretical findings with experiments on generalized linear models.

artificial intelligence, gradient descent, machine learning, (18 more...)

arXiv.org Machine Learning

2512.24999

Country: North America > United States (0.67)

Genre: Research Report > New Finding (0.88)

Industry: Government (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.77)

Add feedback

Generalized equivalences between subsampling and ridge regularization

Neural Information Processing SystemsDec-27-2025, 06:19:25 GMT

We establish precise structural and risk equivalences between subsampling and ridge regularization for ensemble ridge estimators. Specifically, we prove that linear and quadratic functionals of subsample ridge estimators, when fitted with different ridge regularization levels $\lambda$ and subsample aspect ratios $\psi$, are asymptotically equivalent along specific paths in the $(\lambda,\psi)$-plane (where $\psi$ is the ratio of the feature dimension to the subsample size). Our results only require bounded moment assumptions on feature and response distributions and allow for arbitrary joint distributions. Furthermore, we provide a data-dependent method to determine the equivalent paths of $(\lambda,\psi)$. An indirect implication of our equivalences is that optimally tuned ridge regression exhibits a monotonic prediction risk in the data aspect ratio. This resolves a recent open problem raised by Nakkiran et al. for general data distributions under proportional asymptotics, assuming a mild regularity condition that maintains regression hardness through linearized signal-to-noise ratios.

generalized equivalence, name change, ridge regularization, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.41)

Add feedback

Interpolation can hurt robust generalization even when there is no noise

Neural Information Processing SystemsDec-24-2025, 21:27:14 GMT

Numerous recent works show that overparameterization implicitly reduces variance for min-norm interpolators and max-margin classifiers. These findings suggest that ridge regularization has vanishing benefits in high dimensions. We challenge this narrative by showing that, even in the absence of noise, avoiding interpolation through ridge regularization can significantly improve generalization. We prove this phenomenon for the robust risk of both linear regression and classification, and hence provide the first theoretical result on \emph{robust overfitting}.

interpolation, name change, noise, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.82)

Add feedback

c4f2c88e16a579900657c18726641c81-Paper.pdf

Neural Information Processing SystemsAug-17-2025, 06:59:18 GMT

artificial intelligence, estimator, machine learning, (18 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.49)

Add feedback

Generalized equivalences between subsampling and ridge regularization

Neural Information Processing SystemsJan-20-2025, 02:58:34 GMT

We establish precise structural and risk equivalences between subsampling and ridge regularization for ensemble ridge estimators. Specifically, we prove that linear and quadratic functionals of subsample ridge estimators, when fitted with different ridge regularization levels \lambda and subsample aspect ratios \psi, are asymptotically equivalent along specific paths in the (\lambda,\psi) -plane (where \psi is the ratio of the feature dimension to the subsample size). Our results only require bounded moment assumptions on feature and response distributions and allow for arbitrary joint distributions. Furthermore, we provide a data-dependent method to determine the equivalent paths of (\lambda,\psi) . An indirect implication of our equivalences is that optimally tuned ridge regression exhibits a monotonic prediction risk in the data aspect ratio. This resolves a recent open problem raised by Nakkiran et al. for general data distributions under proportional asymptotics, assuming a mild regularity condition that maintains regression hardness through linearized signal-to-noise ratios.

equivalence, generalized equivalence, ridge regularization, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.45)

Add feedback

Interpolation can hurt robust generalization even when there is no noise

Neural Information Processing SystemsJan-19-2025, 02:18:06 GMT

Numerous recent works show that overparameterization implicitly reduces variance for min-norm interpolators and max-margin classifiers. These findings suggest that ridge regularization has vanishing benefits in high dimensions. We challenge this narrative by showing that, even in the absence of noise, avoiding interpolation through ridge regularization can significantly improve generalization. We prove this phenomenon for the robust risk of both linear regression and classification, and hence provide the first theoretical result on \emph{robust overfitting}.

interpolation, noise, ridge regularization

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.98)

Add feedback

Filters

Collaborating Authors

ridge regularization

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Optimal ridge regularization revisited

f976982cd1c1b9e076c096787ef6652e-Paper-Conference.pdf

f976982cd1c1b9e076c096787ef6652e-Paper-Conference.pdf

c4f2c88e16a579900657c18726641c81-Paper.pdf

Basic Inequalities for First-Order Optimization with Applications to Statistical Risk Analysis

Generalized equivalences between subsampling and ridge regularization

Interpolation can hurt robust generalization even when there is no noise

c4f2c88e16a579900657c18726641c81-Paper.pdf

Generalized equivalences between subsampling and ridge regularization

Interpolation can hurt robust generalization even when there is no noise